We present a general principle for estimating a regression function nonparametrically, allowing 
for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both 
the mean and the median regression cases are considered. The method works by first estimating 
the conditional hazard function or conditional survivor function and then integrating. We also 
investigate improved methods that take account of model structure such as independent errors 
and show that such methods can improve performance when the model structure is true. We 
establish the pointwise asymptotic normality of our estimators. 
1. Introduction 

This paper concerns the nonparametric estimation of a regression function $g(x)$ that 
regresses $Y$ on $X = x$, where the nonnegative variable $Y$ is subject to various filtering 
schemes and where $X$ is an observed vector of regressors. We consider both the mean and 
the median regression case. A common particular case is the standard censored regression 
model $Y = g(X) + \varepsilon$, where $X$ is an observed $d$-dimensional vector of regressors, $Y$ is 
subject to random right censoring and $\varepsilon$ is an unobserved error satisfying $E(\varepsilon \mid X) = 0$. 
We make two contributions. First, we present a completely nonparametric estimation 
methodology. This is done under more general censoring patterns than in previous papers. 
Second, we assume that the error is independent of the covariate and we show how to 
construct a more efficient estimator that takes account of the common shape. 

Parametric and semiparametric estimators of censored regression models include Heckman [15], 
Buckley and James [6], Koul, Susarla and Van Ryzin [23], Powell [32-34], Duncan [9], 
Fernandez [13], Horowitz [20, 21], Ritov [36], Honore and Powell [19], Buchinsky 
and Hahn [5] and Heuchenne and Van Keilegom [16]. Many of these authors either assume 
$g(x) = \beta^\top x$ or some other parametric form, provide estimates of average derivatives 
only up to an unknown scale or assume that the error distribution is parametric. The 
fully nonparametric $g(x)$ model we consider is important because of the sensitivity of 
the parametric and semiparametric estimators to misspecification of functional form. A 
small number of estimators exist for nonparametric censored regression models, in most 
cases focusing on the standard random censoring model. Dabrowska [8] and Van Keilegom 
and Veraverbeke [42] proposed nonparametric censored regression estimators based 
on quantile methods. Lewbel and Linton [24] considered the above standard censoring 
model, except that the censoring time $C$ is taken to be a degenerate random variable 
(i.e., it is constant), while Heuchenne and Van Keilegom [17, 18] considered the standard 
model when it is supposed that $\varepsilon$ is independent of $X$. 

In this paper, we propose a unified approach to the estimation of the regression function 
from filtered data. Filtering, for example, left truncation or right censoring, means that 
even though some information is available about Y , Y itself is sometimes not observed, 
even though X is observed. It is imperative for us that our estimation principles are 
natural and well known in the simple case of independent identically distributed errors 
with no filtering. Our approach makes use of tools from the field of counting process 
theory; see [2] and [14]. 

First, we recognize that the generic regression model can be reformulated through 
the counting process $N(y) = I(Y \le y)$ such that $Y = \int_0^\infty I(Y > y)\,dy = \int_0^\infty y\,N(dy)$. The 
advantage of the counting process approach is that it readily lends itself to quite gen- 
eral filtering mechanisms, allowing for complicated left truncation and right censoring 
patterns. 

We reformulate the regression model in terms of a counting process $N$ having stochastic 
intensity function 

$$\lambda(y) = \alpha_x(y) Z(y)$$

with respect to the increasing, right-continuous and complete filtration 
$\mathcal{F}_y = \sigma\{X, N(u) : 0 \le u \le y\}$. Here, $Z(y) = 1 - N(y)$ and $\alpha_x(y)$ is the conditional hazard function of $Y$ 
given that $X = x$. With these definitions, we have that the conditional mean is given by 

$$g_{mn}(x) = E(Y \mid X = x) = -\int_0^\infty y\,S_x(dy) = \int_0^\infty u\,\alpha_x(u)\exp\left\{-\int_0^u \alpha_x(v)\,dv\right\}du \tag{1}$$

and the conditional median is given by 

$$g_{med}(x) = S_x^{-1}(0.5), \tag{2}$$

where the relation between the conditional survival function $S_x(\cdot)$ and the conditional 
hazard function $\alpha_x(\cdot)$ is given by 

$$S_x(y) = \exp\left\{-\int_0^y \alpha_x(u)\,du\right\}.$$

This connection between the hazard function and the regression function is the basis of 
our estimation. 
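
To illustrate how (1) and (2) turn a hazard into regression quantities, the following minimal sketch integrates a known hazard numerically. The function name `mean_and_median_from_hazard`, the grid and the constant hazard in the example are assumptions made purely for illustration; they are not part of the estimation method of this paper.

```python
import numpy as np

def mean_and_median_from_hazard(alpha, grid):
    """Numerically evaluate (1) and (2) for a hazard `alpha` on an increasing grid."""
    a = alpha(grid)
    du = np.diff(grid)
    # Cumulative hazard by the trapezoidal rule, starting at 0.
    cum_hazard = np.concatenate(([0.0], np.cumsum(0.5 * (a[1:] + a[:-1]) * du)))
    surv = np.exp(-cum_hazard)
    # Mean as in (1): integral of u * alpha(u) * S(u) over the grid.
    integrand = grid * a * surv
    mean = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * du))
    # Median as in (2): first grid point at which S has dropped to 0.5 or below.
    median = float(grid[np.argmax(surv <= 0.5)])
    return mean, median

# Illustrative check with a constant hazard of 2 (exponential distribution).
grid = np.linspace(0.0, 40.0, 400001)
mean, median = mean_and_median_from_hazard(lambda u: np.full_like(u, 2.0), grid)
```

For the constant hazard used here, the exact values are a mean of 1/2 and a median of (ln 2)/2, which gives a simple way of checking the numerical integration.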

For the first contribution of this paper, we consider $\alpha_x(y)$ as estimated from a local 
constant least-squares principle or a local linear least-squares principle. Plugging these 
estimators into the expressions (1) and (2) results in, respectively, a local constant estimator $\hat g_C$ 
and a local linear estimator $\hat g_L$ of the conditional mean or median. It is important to 
note that in the absence of filtering, the traditional local constant and local linear kernel 
regression estimators are special cases of the estimators $\hat g_C$ and $\hat g_L$. 

The second contribution of this paper is concerned with the estimation of the functions 
$g_{mn}(\cdot)$ and $g_{med}(\cdot)$ when some structure is imposed on the model. If there is a substantial 
level of filtering, then one can envision areas where truncation or censoring imply that 
we do not have local information on the entire shape of the error distribution around 
every $x$. One can alleviate this by imposing assumptions on the shape of these local error 
distributions. The simplest model assumption in this connection is the multiplicative 
regression model 

$$Y = g(X)\varepsilon_0, \tag{3}$$

where the error term $\varepsilon_0$ is independent of $X$ and has mean or median equal to one, and 
where $g(X)$ is either $g_{mn}(X)$ or $g_{med}(X)$. Under this model, 

$$\alpha_{\varepsilon_0 \mid x} = \alpha_0 \tag{4}$$

for some function $\alpha_0$, where $\alpha_{\varepsilon_0 \mid x}$ is the conditional hazard function of $\varepsilon_0$ given that 
$X = x$. 

If model (3) is true, then it can be used to improve estimation, even in the case without 
filtering; see [38]. Our estimation strategy in this case is sequential. We first obtain the 
unrestricted estimator $\hat g(\cdot) = \hat g_{mn}(\cdot)$ or $\hat g_{med}(\cdot)$ described above. We then use the relation 

$$\alpha_x(y) = \frac{1}{g(x)}\,\alpha_0\left(\frac{y}{g(x)}\right) \tag{5}$$

or, equivalently, $\alpha_0(u) = g(x)\alpha_x(ug(x))$ to obtain an estimate for $\alpha_0(\cdot)$. We use a minimum 
chi-squared approach to do this optimally, which involves replacing $g(x)$ by $\hat g(x)$ 
and $\alpha_x(y)$ by the completely nonparametric estimator $\hat\alpha_x(y)$. Given an estimator of $\alpha_0(\cdot)$, 
we then obtain a new estimator of $g(x)$ using the minimum chi-squared approach, again 
based on the relation $\alpha_x(y) = \alpha_0(y/g(x))/g(x)$, but now replacing $\alpha_0(u)$ by $\hat\alpha_0(u)$ and 
$\alpha_x(y)$ by $\hat\alpha_x(y)$. We will argue that our estimator fulfills a local efficiency criterion. Van 
Keilegom and Akritas [41] and Heuchenne and Van Keilegom [17, 18] discuss estimation 
of $S_x(y)$ and $E(Y \mid X = x)$, respectively, in the additive error model when $Y - E(Y \mid X)$ is 
independent of $X$. In the first two papers, $S_x(y)$ or $E(Y \mid X = x)$, respectively, is written 
as a functional of the error distribution and of the distribution of the covariates. The 
estimator is based on plugging in estimates of these distributions. In the last paper, censored 
observations are replaced by synthetic data points. In all three of these papers, 
efficiency issues are not discussed and the analysis is restricted to the case of random 
right censoring. 



Nonparametric regression with filtered data 






The outline of the paper is as follows. In Section 2, we describe the theoretical back- 
ground in terms of the counting process formulation, including the important special case 
of filtered data. In Section 3, we introduce our approach to regression based on filtered 
data in the general situation, where we do not restrict the functional form of the error 
distribution. We present the local constant case in detail; the local linear case is given 
in the Appendix. The more efficient estimator (at least when the assumption is correct) 
based on the assumption on the functional form (assumption (4)) is introduced in Sec- 
tion 4, where we also give its asymptotic distribution. In Section 5, we present a small 
simulation study. In the Appendix, we give the proofs of the main distribution results 
contained in the text. 

2. The counting process framework 

Let $(X_i, Y_i)$, $i = 1, \dots, n$, be $n$ i.i.d. replications of the random vector $(X, Y)$, where the 
response $Y_i$ is subject to filtering and therefore possibly unobserved, and the covariate 
$X_i = (X_{i1}, \dots, X_{id})$ is completely observed. 

2.1. The unfiltered case 

Define $N_i(y) = I(Y_i \le y)$ for all $y$ in the support of $Y_i$. Then $N = (N_1, \dots, N_n)$ is 
an $n$-dimensional counting process with respect to possibly different, increasing, right-continuous, 
complete filtrations $\mathcal{F}^i_y$; see [2], page 60. We assume that with respect to the 
filtration, $N_i$ has stochastic intensity 

$$\lambda_i(y) = \alpha_{X_i}(y) Z_i(y), \tag{6}$$

where $Z_i(y) = I(Y_i \ge y)$ is a predictable process taking values in $\{0,1\}$. We have not 
restricted the conditional distribution $S_{X_i}$, and the functional form of the conditional 
hazard function is likewise unrestricted. With these definitions, $\lambda_i$ is predictable, and 
the processes $M_i(y) = N_i(y) - \Lambda_i(y)$, $i = 1, \dots, n$, with compensators $\Lambda_i(y) = \int_0^y \lambda_i(s)\,ds$, 
are square-integrable local martingales on the support of $Y_i$. 

We can allow this extremely general model description since the martingale central 
limit theorem dating back to Rebolledo [35] can be applied in this context; see [2], pages 
82-85. Our framework is sufficiently general to include a number of interdependencies, 
including a variety of time series analyses. 

2.2. The filtered case 

In this section, we follow Andersen [1], page 50. Let $C_i(y)$ be a predictable process taking 
values in $\{0, 1\}$, indicating (by the value 1) when the $i$th individual is at risk. Note that 
the predictability condition of $C_i(y)$ allows it to depend on $X_i = (X_{i1}, \dots, X_{id})$ in every 
possible way. Let 

$$\int_0^y C_i(s)\,dN_i(s)$$

be the filtered counting process and introduce the filtered filtration $\mathcal{F}_y = \sigma(N(s), X, CZ(s);\ s \le y)$. 
The random intensity process $\lambda_i$ is then 

$$\lambda_i(y) = \alpha_{X_i}(y) C_i(y) Z_i(y)$$

and the integrated random intensity process is 

$$\Lambda_i(y) = \int_0^y \lambda_i(s)\,ds = \int_0^y \alpha_{X_i}(s) C_i(s) Z_i(s)\,ds.$$

With these definitions, $M_i(y) = N_i(y) - \Lambda_i(y)$ is a square-integrable martingale with 
respect to the filtration $(\mathcal{F}_y)_{y \ge 0}$. Note that, in the filtered case, $Z_i(y) = I(Y_i \ge y)$ is not 
always observed, but the product $(C_iZ_i)(y)$ is always observable. 



3. Estimation under the completely nonparametric 
model 

In this section, local constant and local linear estimators under the general nonparametric 
model are given. These estimators take the local constant and the local linear marker- 
dependent kernel hazard estimators of Nielsen and Linton [31] and Nielsen [30] as their 
starting point. In the special case of no filtering, this results in the convenient property 
that the regression estimator based on the local constant hazard estimator is the well-known 
local constant regression estimator, the Nadaraya-Watson estimator, and the local 
linear hazard estimator results in the local linear regression estimator; see, for example, 
[12]. 

Let $K$ be a $d$-dimensional kernel, $k$ be a one-dimensional kernel, $b = (b_1, \dots, b_d)$ be 
a $d$-dimensional bandwidth vector and $h$ be a one-dimensional bandwidth. For any real 
$u$ and any $d$-dimensional vector $x = (x_1, \dots, x_d)$, define $k_h(u) = k(u/h)/h$ and 
$K_b(x) = |b|^{-1}K(x/b)$, where $x/b = (x_1/b_1, \dots, x_d/b_d)$ and $|b| = \prod_{i=1}^d b_i$. The estimator suggested 
by Nielsen and Linton [31] is 

$$\hat\alpha_{x,C}(y) = \frac{O^C_{x,y}}{E^C_{x,y}}, \tag{7}$$

where 

$$O^C_{x,y} = n^{-1}\sum_{i=1}^n \int K_b(x - X_i)k_h(y - u)\,dN_i(u),$$

$$E^C_{x,y} = n^{-1}\sum_{i=1}^n \int K_b(x - X_i)k_h(y - u)C_i(u)Z_i(u)\,du.$$

This estimator was identified as a local constant least-squares estimator in [30]. The 
super/subscript $C$ stands for local constant smoothing. Below, we will also introduce 
estimators based on local linear smoothing. This will be indicated by a super/subscript 
$L$ in the notation. 

We wish to estimate the conditional integrated hazard $\Lambda_x(y) = \int_0^y \alpha_x(u)\,du$. We could 
just integrate $\hat\alpha_{x,C}(y)$ with respect to $y$, but a better strategy is to first let the bandwidth 
$h \to 0$, which eliminates redundant smoothing. The resulting estimator is 

$$\hat\Lambda_{x,C}(y) = \sum_{i=1}^n \int_0^y \frac{K_b(x - X_i)}{\sum_{j=1}^n K_b(x - X_j)C_j(u)Z_j(u)}\,dN_i(u). \tag{8}$$

Note that $\hat\Lambda_{x,C}(y)$ equals the estimator of $\Lambda_x(y)$ proposed by Beran [3] and Dabrowska 
[7] in the case of random censoring. We then estimate the conditional survivor function 
$S_x(y)$ by the product limit estimator of Johansen and Gill [22]; see [2], that is, 

$$\hat S_{x,C}(y) = \prod_{0 \le w \le y}\{1 - d\hat\Lambda_{x,C}(w)\} \tag{9}$$

for $y \le T$, where $T$ satisfies assumption (A) below. The local constant estimator of 
$g^T_{mn}(x) = E\{YI(Y \le T) \mid X = x\}$ is 

$$\hat g_{C,mn}(x) = -\int_0^T y\,\hat S_{x,C}(dy). \tag{10}$$

A local constant estimator of $g_{med}(x) = \mathrm{med}(Y \mid X = x)$ is given by 

$$\hat g_{C,med}(x) = \hat S^{-1}_{x,C}(0.5),$$

where for any $0 < p < 1$, $\hat S^{-1}_{x,C}(p) = \inf\{y : \hat S_{x,C}(y) \le 1 - p\}$. 
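
The construction above can be sketched in a few lines for the special case of random right censoring with a one-dimensional covariate. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the Epanechnikov kernel and the absence of ties among the observed times are all choices made here for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice; any bounded-support kernel would do)."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_constant_survival(x, X, T_obs, delta, b):
    """Kernel-weighted product-limit estimate of S_x at the sorted observed times.

    T_obs holds min(Y_i, censoring time), delta is 1 for uncensored observations.
    Ties among the observed times are assumed absent.
    """
    order = np.argsort(T_obs, kind="stable")
    t, d = T_obs[order], delta[order]
    w = epanechnikov((x - X[order]) / b)      # the factor 1/b cancels in the ratio
    at_risk = np.cumsum(w[::-1])[::-1]        # sum of weights with T_obs >= t_i
    safe = np.where(at_risk > 0, at_risk, 1.0)
    jumps = np.where((at_risk > 0) & (d == 1), w / safe, 0.0)   # increments of the hazard
    return t, np.cumprod(1.0 - jumps)

def mean_and_median(t, surv, T=np.inf):
    """Truncated mean as in (10) and median as the generalized inverse at 0.5."""
    prev = np.concatenate(([1.0], surv[:-1]))
    mass = prev - surv                        # mass placed by the estimate on each t_i
    keep = t <= T
    mean = float(np.sum(t[keep] * mass[keep]))
    below = np.nonzero(surv <= 0.5)[0]
    median = float(t[below[0]]) if below.size else float("nan")
    return mean, median
```

With `delta` equal to one everywhere and `T` infinite, the mean returned by this sketch coincides with the kernel-weighted average of the responses, which is a convenient sanity check.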

Another option would have been to define $\hat S_x(y) = \exp\{-\hat\Lambda_x(y)\}$ in the above formula. 
The advantage of the weighted product limit estimator is that we arrive at exactly 
the extension of the Kaplan-Meier estimator to filtered data in the absence of covariates 
and at the weighted empirical distribution function [37] in the absence of filtering. As a 
consequence, (10) reduces to the well-known Nadaraya-Watson estimator when $T = \infty$ 
and when all data are completely observed. 

In a similar way, the local linear estimators of $S_x(y)$, $g_{mn}(x)$ and $g_{med}(x)$, denoted 
$\hat S_{x,L}(y)$, $\hat g_{L,mn}(x)$ and $\hat g_{L,med}(x)$, respectively, can be defined. We refer to the Appendix 
for their precise definitions. 

For the asymptotic properties of the unrestricted estimators $\hat g_{C,mn}(x)$ and $\hat g_{L,mn}(x)$ of 
$g_{mn}(x)$, we need to assume the following for $x \in R_X$, where $R_X$ is a bounded interval in 
the interior of the support of $X$. All of our results are stated for the special case of a 
one-dimensional covariate $X$, $d = 1$. The results can be easily generalized to a multivariate 
setting. 
(D1) The derivatives $\partial\alpha_x(u)/\partial x$ and $\partial^2\alpha_x(u)/\partial x^2$ exist and are uniformly continuous in 
$x \in R_X$, $u \in [0,T]$. 

(D2) The kernel $K$ is symmetric, continuous and has bounded support. The bandwidth 
$b$ satisfies $b \to 0$, $nb \to \infty$ and $nb^5 = O(1)$. 
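(D3) The truncation variable $T$ is such that $\inf_{x \in R_X, u \in [0,T]} \varphi_x(u) > 0$. 
(D4) There exists a continuous function $\varphi_x(y)$ such that 

$$\sup_{y \in [0,T]}\ \sup_{x \in R_X}\left| \frac{1}{n}\sum_{i=1}^n K_b(x - X_i)C_i(y)Z_i(y) - \varphi_x(y) \right| \to_P 0.$$

(D5) The derivative $\partial\varphi_x(y)/\partial x$ exists and is continuous. It holds that 

$$\sup_{y \in [0,T]}\left| \frac{1}{n}\sum_{i=1}^n (X_i - x)b^{-2}K_b(x - X_i)C_i(y)Z_i(y) - \mu_2(K)\frac{\partial\varphi_x(y)}{\partial x} \right| \to_P 0,$$

where $\mu_2(K) = \int u^2K(u)\,du$. 
(D6) For $A \in \{C, L\}$, it holds that 

$$\sup_{y \in [0,T]} |\hat S_{x,A}(y) - S_x(y)| \to_P 0,$$

$$\sup_{y \in [0,T]} |S^*_{x,A}(y) - S_x(y)| \to_P 0.$$

Here, $S^*_{x,A}(y)$ is defined as $\hat S_{x,A}(y)$ in (8), (9), (15) and (16), but with $N_i(y)$ 
replaced by $\Lambda_i(y)$. (An explicit definition of $S^*_{x,C}(y)$ is also given in the proof of 
Theorem 3.1.) 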

These assumptions are rather standard smoothing assumptions. Assumptions (D4)-(D6) 
are low-level assumptions. We chose them instead of high-level assumptions to avoid 
more specific assumptions on the censoring. For the unfiltered case, these assumptions 
are classical smoothing results. For the filtered case, consider first the case of random 
right censoring. Then 

$$n^{-1}\sum_{i=1}^n K_b(x - X_i)C_i(y)Z_i(y) = n^{-1}\sum_{i=1}^n K_b(x - X_i)I(Y_i^* \ge y),$$

where $Y_i^*$ is the minimum of the survival time $Y_i$ and the censoring time $C_i$, which 
are supposed to be independent of each other given $X_i$. It is easily seen that the latter 
quantity converges to $\varphi_x(y) := f(x)P(Y^* \ge y \mid X = x)$ uniformly in $x \in R_X$ and $y \in [0,T]$. 
Other examples of filtering (including, e.g., left and/or right truncation and/or censoring) 
can be handled in a similar way. Assumption (D5) is only needed for the asymptotic result 
based on local constant smoothing and not for local linear smoothing. 

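Theorem 3.1. Suppose that assumptions (D1)-(D6) hold. There then exist bounded 
continuous functions $\beta_A$ and $v_A$, $A \in \{C, L\}$, such that for all $x \in R_X$, 

$$\sqrt{nb}\,\{\hat g_{A,mn}(x) - g^T_{mn}(x) - b^2\beta_A(x)\} \Rightarrow N(0, v_A(x)),$$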



where 



/O JQ 



/3l{x) = -ti2iK) S^iy) ' dudy, 



$$v_C(x) = \|K\|_2^2\int_0^T \frac{\alpha_x(u)}{\varphi_x(u)}\left(\int_u^T S_x(y)\,dy\right)^2 du,$$

$$v_L(x) = v_C(x).$$



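To be consistent with the theory for kernel regression estimators, it must be that in 
the absence of filtering, 

$$v_C(x) = \|K\|_2^2\,\frac{\sigma^2(x)}{f(x)},$$

where $\sigma^2(x) = \mathrm{var}[Y \mid X = x]$ and $f(x)$ is the covariate density. Note that 

$$\mathrm{var}[Y \mid X = x] = 2\int_0^\infty uS_x(u)\,du - \left(\int_0^\infty S_x(u)\,du\right)^2.$$

In the absence of filtering, $\varphi_x(u) = f(x)S_x(u)$. Therefore, it should be the case that 

$$\int_0^\infty \frac{\alpha_x(u)}{S_x(u)}\left(\int_u^\infty S_x(y)\,dy\right)^2 du = 2\int_0^\infty uS_x(u)\,du - \left(\int_0^\infty S_x(u)\,du\right)^2.$$

This follows by integration by parts, using $\alpha_x(u)/S_x(u)\,du = d\{1/S_x(u)\}$. 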
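
As a numerical illustration of the identity just stated, the sketch below evaluates both sides for a single assumed distribution, a Weibull law with survival function exp(-u^2) and hazard 2u. The distribution, the grid and the helper `trapezoid` are choices made here for the example only.

```python
import math
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

u = np.linspace(0.0, 8.0, 8001)
surv = np.exp(-u**2)            # assumed S_x(u)
hazard = 2.0 * u                # corresponding hazard alpha_x(u)
# Tail integral of S from u to infinity, available in closed form via erfc.
tail = 0.5 * math.sqrt(math.pi) * np.array([math.erfc(v) for v in u])

lhs = trapezoid(hazard / surv * tail**2, u)                      # left-hand side
rhs = 2.0 * trapezoid(u * surv, u) - trapezoid(surv, u) ** 2     # conditional variance
```

For this distribution the variance is 1 - pi/4, so both `lhs` and `rhs` should be close to 0.2146.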

For $g_{med}(x)$, it has been shown in [42] that $\hat g_{C,med}(x)$ is asymptotically normal when 
the data are subject to random right censoring. It can be shown that this result continues 
to hold true for general filtering patterns. 

4. Estimation under common shape of the error 
distribution 



Under some circumstances, it may be plausible to assume that the error distribution, 
when adjusted for the mean or the median, is generated by the same underlying shape. 
If there is a substantial level of filtering, then one can envision areas where truncation or 
censoring imply that we do not have local information on the entire shape of the error 
distribution around every $x$. One can alleviate this by imposing assumptions on the shape 
of these local error distributions. The simplest assumption in this connection is simply 
that $\alpha_{\varepsilon_0 \mid x}$ does not depend on $x$, where $\varepsilon_0$ is the error term in model (3). This is 

$$\alpha_{\varepsilon_0 \mid x}(u) = \alpha_0(u) \tag{11}$$

for some $\alpha_0$ and all $u > 0$. If this assumption is true, then it can be used to improve 
estimation, even in the case without filtering, as we now discuss. The notion of efficiency is 
here tied to asymptotic variance, which yields mean-squared error holding bias constant, 
and comes from the classical parametric theory of likelihood. The local likelihood method 
was introduced in [38] and has been applied in many other contexts. Tibshirani [38], 
Chapter 5, presents the justification for the local likelihood method (in the context of 
an exponential family) : the author shows that its asymptotic variance is the same as the 
asymptotic variance of the maximum likelihood estimator (MLE) of a correctly specified 
parametric model at the point of interest using the same number of observations as the 
local likelihood method. This type of result has been shown in other settings, for example, 
Linton and Xiao [28] establish efficiency of a local likelihood estimator in the context of 
nonparametric regression with additive errors. In generalized additive models, Linton 
[25, 26] shows the improvement according to variance obtainable by the local likelihood 
method. 

In what follows, $g(x)$ is either $g_{mn}(x)$ or $g_{med}(x)$ and similarly for the estimators of 
$g(x)$. 



4.1. Oracle estimation of the location g{x) 

First, we note that both the local constant and the local linear kernel estimator of the 
full marker-dependent hazard model have the form 

$$\hat\alpha_{x,A}(y) = \frac{O^A_{x,y}}{E^A_{x,y}},$$

where $A$ equals $C$ for the local constant case and $A$ equals $L$ for the local linear case. 
Let us suppose that an oracle told us what $\alpha_0$ is. We define the local constant estimator 
and the local linear estimators of $g$ based on the assumption (4) to be any minimizer $\hat g^o_A$ 
of the criterion function 

$$\int\!\!\int \left\{\hat\alpha_{x,A}(y) - \frac{1}{g(x)}\alpha_0\left(\frac{y}{g(x)}\right)\right\}^2 \{\hat\alpha_{x,A}(y)\}^{-1}E^A_{x,y}\,w(x,y)\,dx\,dy,$$

where $w(x,y)$ is an appropriate weight function. This is motivated by the theory of 
minimum chi-squared estimation [4], in which efficiency is achieved by weighting a 
least-squares criterion with the inverse of the asymptotic variance of the unrestricted estimator 
(in this case, $\hat\alpha_{x,A}(y)$, which has asymptotic variance $\alpha_x(y)/\varphi_x(y)$, where $\varphi_x(y)$ is the 
probability limit of the exposure $E^A_{x,y}$). For a fixed $x$, this expression is minimized by 
minimizing the pointwise criterion 

$$\hat l_{\alpha_0}(\theta; x) = \int \left\{\hat\alpha_{x,A}(y) - \frac{1}{\theta}\alpha_0\left(\frac{y}{\theta}\right)\right\}^2 \{\hat\alpha_{x,A}(y)\}^{-1}E^A_{x,y}\,w(x,y)\,dy \tag{12}$$

with respect to $\theta$ and setting $\hat g^o_A(x) = \hat\theta = \arg\min_{\theta \in \Theta}\hat l_{\alpha_0}(\theta; x)$ for some compact set $\Theta$ 
not containing 0. This is a nonlinear estimator, not obtainable in closed form. 
Define 

$$l(\theta; x) = \int \left\{\alpha_x(y) - \frac{1}{\theta}\alpha_0\left(\frac{y}{\theta}\right)\right\}^2 \{\alpha_x(y)\}^{-1}\varphi_x(y)\,w(x,y)\,dy$$

and let $\theta_0 = g(x)$. 
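
The pointwise criterion can be illustrated with a small numerical sketch. Everything concrete below is an assumption for the purpose of the example: the baseline hazard, the value of g(x), the exposure limit, the constant weight, and the use of a plain grid search in place of any particular optimizer. Noise-free population quantities are used, so the minimizer should recover g(x) up to the grid resolution.

```python
import numpy as np

def chi2_criterion(theta, y, alpha_x, phi_x, w, alpha0):
    """Pointwise weighted least-squares criterion l(theta; x) by the trapezoidal rule."""
    resid = alpha_x - alpha0(y / theta) / theta
    integrand = resid**2 / alpha_x * phi_x * w
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y)))

alpha0 = lambda u: 2.0 * u              # assumed baseline hazard (Weibull, shape 2)
g_x = 2.0                               # assumed true value of g(x)
y = np.linspace(0.5, 3.0, 501)          # region on which the weight is positive
alpha_x = alpha0(y / g_x) / g_x         # hazard implied by the multiplicative model
phi_x = np.exp(-(y / g_x) ** 2)         # exposure limit without filtering, up to f(x)
w = np.ones_like(y)

thetas = np.linspace(0.5, 4.0, 351)     # compact candidate set not containing 0
values = np.array([chi2_criterion(th, y, alpha_x, phi_x, w, alpha0) for th in thetas])
theta_hat = float(thetas[np.argmin(values)])
```

In an application, `alpha_x` and `phi_x` would be replaced by the kernel hazard estimate and the exposure, and a one-dimensional optimizer could replace the grid search.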

For the asymptotic result below, we need to assume the following: 

(A1) (i) The weight function $w(x,y)$ is continuous and satisfies $w(x,y) = 0$ for $(x,y) \notin I$ 
and $0 \le w(x,y) \le a$ for all $(x,y) \in I$, where $0 < a < \infty$ and $I = \{(x,y) : x \in R_X, \tau_x \le y \le T_x\}$, 
where $\tau_x$ and $T_x$ are continuous functions and where, as in 
(D1)-(D5), $R_X$ is a bounded interval in the interior of the support of $X$. 
(ii) There exists a continuous function $\varphi_x(y)$ with $\inf_{(x,y) \in I}\varphi_x(y) > 0$ such that 
the convergence statements in (D4) and (D5) hold with the supremum running 
over $(x,y) \in I$ instead of $y \in [0,T]$. The function $\varphi_x(y)$ is twice continuously 
differentiable in $y$ for $(x,y) \in I$. 

(A2) The function $\alpha_x(y) = g(x)^{-1}\alpha_0[g(x)^{-1}y]$ is twice continuously differentiable in 
$(x,y) \in I$ and $\inf_{(x,y) \in I}\alpha_x(y) > 0$. 

(A3) The probability density functions $K$ and $k$ are symmetric around 0 and have 
support $[-1, 1]$, $\int uK(u)\,du = \int uk(u)\,du = 0$, $\int u^2K(u)\,du \ne 0$, $\int u^2k(u)\,du \ne 0$, 
and $K$ and $k$ are twice continuously differentiable. 

(A4) For all $\varepsilon > 0$, $\inf_{|\theta - \theta_0| > \varepsilon}|l(\theta; x) - l(\theta_0; x)| > 0$, $l(\theta; x)$ is twice differentiable with 
respect to $\theta$ in a neighborhood of $\theta_0$ and $l''(\theta_0; x) > 0$. 

(A5) The bandwidths $h$ and $b$ satisfy $h \to 0$, $b \to 0$, $nhb \to \infty$, $nh^4b = O(1)$ and 
$nb^5 = O(1)$. 

Conditions (A2), (A3) and (A5) are standard smoothing assumptions. Assumption 
(A1) is stated uniformly in $x$ because such a uniform version is required in the later 
Theorems 4.2 and 4.3. 



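Theorem 4.1. Suppose that assumptions (A1)-(A5) hold. There then exist bounded 
continuous functions $\beta^o_{A1}$ and $\beta^o_{A2}$, $A \in \{C, L\}$, such that for all $x \in R_X$, 

$$\sqrt{nb}\,\{\hat g^o_A(x) - g(x) - h^2\beta^o_{A1}(x) - b^2\beta^o_{A2}(x)\} \Rightarrow N(0, v^o_A(x)),$$

where, with $s_0(u) = 1 + u\alpha_0'(u)/\alpha_0(u)$, 

$$v^o_C(x) = g(x)^2\|K\|_2^2\,\frac{\int s_0\left(\frac{y}{g(x)}\right)^2\alpha_x(y)\varphi_x(y)w(x,y)^2\,dy}{\left\{\int s_0\left(\frac{y}{g(x)}\right)^2\alpha_x(y)\varphi_x(y)w(x,y)\,dy\right\}^2}.$$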
In the absence of filtering, the optimal estimator of $g(x)$, given the knowledge of $\alpha_0(\cdot)$ 
or, equivalently, of the density $f_\varepsilon(\cdot)$ of $\varepsilon_0$, is the local likelihood estimator that maximizes 

$$\frac{1}{n}\sum_{i=1}^n K_b(x - X_i)\left\{\ln f_\varepsilon\left(\frac{Y_i}{\theta}\right) - \ln\theta\right\},$$

which has score function 

$$-\frac{1}{n\theta}\sum_{i=1}^n K_b(x - X_i)\left\{\varepsilon_i(\theta)\frac{f_\varepsilon'}{f_\varepsilon}(\varepsilon_i(\theta)) + 1\right\},$$

where $\varepsilon_i(\theta) = Y_i/\theta$. The object $s_\varepsilon(u) = u(f_\varepsilon'/f_\varepsilon)(u) + 1$ is known as the Fisher scale score 
and $I_2(f_\varepsilon) = \int s_\varepsilon^2(u)f_\varepsilon(u)\,du$ is the corresponding information. One can show that the 
asymptotic variance of this oracle local likelihood estimator is 

$$\frac{g(x)^2\|K\|_2^2}{f(x)I_2(f_\varepsilon)}. \tag{13}$$

Supposing that we had $k_n = [nbf(x)/\|K\|_2^2]$ observations from the model $Y = g(x)\varepsilon$, the 
MLE of $\theta_0 = g(x)$ would have asymptotic variance $g(x)^2/\{I_2(f_\varepsilon)k_n\}$. In this sense, the local 
likelihood method has the efficiency of the MLE from a sample of size $k_n$. 
By Efron and Johnstone [10], we have 

$$I_2(f_\varepsilon) = \int\left(1 + u\frac{\alpha_0'(u)}{\alpha_0(u)}\right)^2 f_\varepsilon(u)\,du,$$

which explains the form of the asymptotic variance above. Suppose that we take $w(x,y) = 1$, 
make a change of variables $y \mapsto u = y/g(x)$ in $v^o_C(x)$ and make use of the fact that, 
under no filtering, $\varphi_x(y) = f(x)S_x(y)$ and so $\alpha_0(u)\varphi_x(ug(x)) = f_\varepsilon(u)f(x)$. Then $v^o_C(x)$ equals 
(13). This shows that $\hat g^o_C(x)$ is asymptotically equivalent to the oracle local likelihood 
method, that is, efficient in this sense. 
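
The equality between the Fisher scale information written in terms of the density and in terms of the hazard can be checked numerically for a specific error law. The sketch below does so for a Weibull density with shape 1.5; the distribution, the grid and the helper `trapezoid` are assumptions made here for illustration only.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

k = 1.5                                          # assumed Weibull shape parameter
u = np.linspace(1e-8, 12.0, 1200001)
f = k * u ** (k - 1.0) * np.exp(-u**k)           # density of the error term

# Fisher scale score 1 + u f'(u)/f(u), with f'/f = (k - 1)/u - k u^(k - 1).
fisher_score = 1.0 + u * ((k - 1.0) / u - k * u ** (k - 1.0))
# Hazard version 1 + u alpha0'(u)/alpha0(u), with alpha0(u) = k u^(k - 1).
hazard_score = 1.0 + u * ((k - 1.0) / u)

info_density = trapezoid(fisher_score**2 * f, u)
info_hazard = trapezoid(hazard_score**2 * f, u)
```

For a Weibull law with shape k both integrals equal k^2, so with k = 1.5 the two numbers should agree with 2.25 up to discretization error.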



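4.2. Estimation with unknown $\alpha_0$ 

For a given $g(\cdot)$, an estimator of $\alpha_0$ can be based on the minimization principle 

$$\hat\alpha_{0,A} = \arg\min_{\alpha(\cdot)}\int\!\!\int\left\{\hat\alpha_{x,A}(y) - \frac{1}{g(x)}\alpha\left(\frac{y}{g(x)}\right)\right\}^2\{\hat\alpha_{x,A}(y)\}^{-1}E^A_{x,y}\,w(x,y)\,dx\,dy,$$

where the choice of weighting function is again motivated by efficiency considerations. 
Changing variables $y \mapsto u = y/g(x)$, the objective function becomes 

$$\int\!\!\int\left\{\hat\alpha_{x,A}(ug(x)) - \frac{\alpha(u)}{g(x)}\right\}^2 g(x)\{\hat\alpha_{x,A}(ug(x))\}^{-1}E^A_{x,ug(x)}\,w(x,ug(x))\,dx\,du$$

$$= \int\!\!\int\{g(x)\hat\alpha_{x,A}(ug(x)) - \alpha(u)\}^2\,\frac{E^A_{x,ug(x)}}{g(x)\hat\alpha_{x,A}(ug(x))}\,w(x,ug(x))\,dx\,du,$$

ignoring support considerations. Then, because $\alpha$ does not depend on $x$, we can replace 
it by the pointwise criteria 

$$\int\{g(x)\hat\alpha_{x,A}(ug(x)) - a\}^2\,\frac{E^A_{x,ug(x)}}{g(x)\hat\alpha_{x,A}(ug(x))}\,w(x,ug(x))\,dx$$

for each $u$, whence we obtain the closed form solution 

$$\hat\alpha_{0,A}(y) = \frac{\int E^A_{x,yg(x)}\,w(x,yg(x))\,dx}{\int E^A_{x,yg(x)}\,w(x,yg(x))/\{g(x)\hat\alpha_{x,A}(yg(x))\}\,dx}. \tag{14}$$

In practice, one computes $\hat\alpha_{0,A}(y)$ as (14) with $g(x)$ replaced by a preliminary completely 
nonparametric estimator $\hat g$, that is, 

$$\hat\alpha_{0,A}(y) = \frac{\int E^A_{x,y\hat g(x)}\,w(x,y\hat g(x))\,dx}{\int E^A_{x,y\hat g(x)}\,w(x,y\hat g(x))/\{\hat g(x)\hat\alpha_{x,A}(y\hat g(x))\}\,dx}.$$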
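
The closed-form solution is a weighted harmonic-type average over the covariate values, which the following sketch makes concrete on an equispaced grid in x (so that the grid spacing cancels in the ratio). The function name `alpha0_closed_form` and all inputs in the example are hypothetical, and noise-free population quantities are used so that the result can be checked against the assumed baseline hazard.

```python
import numpy as np

def alpha0_closed_form(exposure, weight, g, alpha_hat):
    """Ratio of Riemann sums over an equispaced x-grid.

    All arrays are evaluated at the points (x_j, y * g(x_j)) for one fixed y.
    """
    numerator = np.sum(exposure * weight)
    denominator = np.sum(exposure * weight / (g * alpha_hat))
    return float(numerator / denominator)

x = np.linspace(0.1, 0.9, 81)
g = 1.0 + x**2                        # assumed regression function
alpha0 = lambda u: 2.0 * u            # assumed baseline hazard
y = 0.7
alpha_hat = alpha0(y) / g             # alpha_x(y g(x)) under the multiplicative model
exposure = np.exp(-x)                 # arbitrary positive exposure values
weight = np.ones_like(x)

estimate = alpha0_closed_form(exposure, weight, g, alpha_hat)
```

Because `g * alpha_hat` is constant in x in this noise-free setting, the ratio returns the baseline hazard at y whatever the exposure and weight are.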

Let $y$ be a fixed value, that is, such that $\tau \le y \le T$, where $\tau > \inf_{x \in R_X}\tau_x/g(x)$ and 
$T < \sup_{x \in R_X}T_x/g(x)$ (and where we assume that $\inf_{x \in R_X}g(x) > 0$). We require the following 
assumptions: 

(B1) The preliminary estimator $\hat g(\cdot)$ satisfies $\sup_{x \in R_X}|\hat g(x) - g(x)| = O_P((nb)^{-1/2}(\log n)^{1/2})$. 

(B2) The function $g(x)$ is twice continuously differentiable in $x \in R_X$ and $\inf_{x \in R_X}g(x) > 0$. 

(B3) The bandwidths $h$ and $b$ satisfy $h \to 0$, $b \to 0$, $nhb \to \infty$, $nb^4h = O(1)$, $nh^5 = O(1)$ 
and $nh^3b(\log n)^{-1} \to \infty$. 

Theorem 4.2. Suppose that assumptions (A1)-(A3) and (B1)-(B3) hold. There then 
exist bounded continuous functions $b_{A1}$ and $b_{A2}$, $A \in \{C, L\}$, such that for all $\tau \le y \le T$, 

$$\sqrt{nh}\,\{\hat\alpha_{0,A}(y) - \alpha_0(y) - h^2b_{A1}(y) - b^2b_{A2}(y)\} \Rightarrow N(0, s_A(y)),$$

where 



sciy)^\\k\\l^^;:7:j^E{E[Ciyg{X))\Y^yg{X),X]fxiygiX))w'iX,yg{X))}, 



■B-{y) 

siiy) = sc{y), 

B°{y) = E[{CZ){yg{X)}w{X, yg{X)}\/ a^{y) 
Finally, we compute a new estimate of $g$ using the estimate of $\alpha_0$. Specifically, define 
the weighted least-squares objective function 

$$\hat l_a(\theta; x) = \int\left\{\hat\alpha_{x,A}(y) - \frac{1}{\theta}a\left(\frac{y}{\theta}\right)\right\}^2\{\hat\alpha_{x,A}(y)\}^{-1}E^A_{x,y}\,w(x,y)\,dy$$

with $A = C$ or $A = L$, and with $a$ equal to $\hat\alpha_{0,C}$, $\hat\alpha_{0,L}$ or another estimator of $\alpha_0$. Then 
let 

$$\hat g^{2\text{-step}}(x) = \arg\min_{\theta \in I_n(x)}\hat l_a(\theta; x),$$

where the argmin runs over a shrinking neighborhood $I_n(x)$ of a consistent estimator 
of $g(x)$. In the next theorem, we state that under some conditions on the estimator $\hat\alpha_0$, 
we obtain the same variance and bias as in the oracle case. One possibility is to use the 
estimator of $g$ given in Section 3 as preliminary estimator and to base the final estimation 
of $g$ on the method of Section 4.1 above, but replacing the oracle $\alpha_0$ by $\hat\alpha_{0,A}$. We 
make use of the following additional assumptions: 

(CI) For a neighborhood J{x) of the closed interval [tx/ g{x),Tx/ g{x)], it holds uni- 
formly for z G J{x) that 

Op((^o,)i), 
Op(<5i,„), 

Op{S2,n) 

for sequences (5o,n, Si_n and 62.71 with 6o^n — o{{\ogn)^^^^h^^^), (5i,„ = o(l), 62.71 = 
o((n6/i)i/2(logn)~i/2), 6o,n62,n = o{l), (5i,„(52,„ = 0(1) and 6o^n6i,n = o{n-^/^). 
(C2) With a bounded function 7(0;), it holds that 

/ [a(y) - aoiy)]po{-^\-^^f^w{x, y) dy - h^-f^x) = 0p((n6)"i/2), 

J Vayx) J 9{xY ax{y) 

where po {u) = ao (w) -I- uaj, (u) . 
(C3) The bandwidths h and h satisfy nhb-^ 00, nh^ = 0(1), nlr' = 0(1) and l/{nb^) = 
0(1). 

These assumptions are rather weak. Assumption (C1) is fulfilled for a standard one-dimensional 
kernel smoother which fulfills the conditions with $\delta_{0,n} = (\log n)^{1/2}n^{-2/5}$, 
$\delta_{1,n} = (\log n)^{1/2}n^{-1/5}$ and $\delta_{2,n} = (\log n)^{1/2}$. The assumption is fulfilled under much slower 
rates of convergence. The assumption could be replaced by another type of condition using 
the general approach of Mammen and Nielsen [29] based on cross-validation arguments. 
Assumption (C2) is a standard property of kernel smoothers: kernel smoothers are local 
weighted averages. Integration of the estimator leads to a global weighted average with 
stochastic part of parametric rate n~^/^. Typically, the rate of the bias part does not 
change. 



la{0]x) 



do(z) - ao(z) = 
Theorem 4.3. Suppose that assumptions (A1)-(A5) and (C1)-(C3) hold. There then 
exist bounded continuous functions $\beta^{2\text{-step}}$ and $v^{2\text{-step}}$ such that for all $x \in R_X$, 

$$\sqrt{nb}\,\{\hat g^{2\text{-step}}(x) - g(x) - b^2\beta^{2\text{-step}}(x)\} \Rightarrow N(0, v^{2\text{-step}}(x)),$$

where $v^{2\text{-step}}(x) = v^o_C(x) = v^o_L(x)$. 

This shows that the two-step estimator achieves the desired oracle property. 

5. Numerical results 

In this section, we look at the small-sample performance of our estimators. The design 
involves a combination of commonly occurring features in the literature: we take the true 
underlying regression function to be identical to that of Fan and Gijbels [11], but our 
disturbance term has a different distribution and we also consider a different censoring 
mechanism. Thus, 

$$g_{mn}(x) = 4.5 - 64x^2(1 - x)^2 - 16(x - 0.5)^2,$$

where $X_i \sim U[0, 1]$, $\varepsilon_i \sim U[0.5, 1.5]$, while $X_i$ and $\varepsilon_i$ are independent and $E(\varepsilon_i) = 1$. The 
censoring time mechanism is independent of the covariate and constructed as follows: 

r if < 0.5, 

* 1 +00 otherwise, 

where ^ Beta{l,3), ^ Beta{l,0.75) and we observe {Y^ A C/,,(5, = 1{Y, < Ui),X,}, 
that is, an example of right censoring. 

We employ two methods of estimation of gmn{X): the simple local constant estima- 
tion of Section 3 and the feasible oracle estimation, as discussed in Section 4.2. For the 
purposes of illustration, we use Silverman's rule of thumb bandwidth and the built-in 
minimization routine based on the golden section search and parabolic interpolation. For 
the more efficient estimator, we note that using the one-dimensional grid search gives a 
very similar estimate. 

We use a sample of size 250 and 15 replications over 200 evenly spaced grids on [0, 1]. In 
this example, approximately 25% of the 200 observations are censored. Figure 1 displays 
the average (over replications) of the two estimates. The true regression function chosen 
possesses a high degree of curvature, with the function increasing less steeply to the right of 
0.5 than to the left of 0.5. Both estimates are capable of capturing the basic structure of 
the true curve. The efficient estimate appears to adapt better at both peaks and troughs, 
and the quality of fit declines with the steepness of the true curve. Although it is not 
shown here, the relative performance of the simple local constant estimator improves 
toward the feasible oracle estimates when the true regression function has lower degree 
variation. Figures 2 and 3 are the QQ-plots for the efficient and inefficient estimates, 
respectively (i.e., $(\hat g - E\hat g)/\mathrm{std}(\hat g)$). The linear trends in the QQ-plots are distinct with the 
efficient estimates performing a little better away from the sample means. Figure 4 plots 
the interquartile range (divided by 1.3) and the standard deviation (across replications) 
for the efficient estimate against grid points. Performance clearly worsens in the boundary 
region. 

Since it is widely perceived that Silverman's rule of thumb bandwidth tends to oversmooth, 
we also performed some experiments with smaller bandwidths. Smaller bandwidth 
leads to much larger simulation time during optimization, due to higher variance. 
In terms of goodness of fit, it does not make a big difference with the feasible oracle 
estimation. However, the improvement of fit for the simple local constant estimation is 
more pronounced. In that case, the feasible oracle estimation still performs better than 
the simple estimator, as expected. 

Figure 1. Plot of the mean of the estimated regression curve. 

Figure 2. QQ-plot of standardized efficient estimates versus standard normal. 

Figure 3. QQ-plot of standardized inefficient estimates versus standard normal. 

Figure 4. Normalized interquartile range and standard deviation plots for efficient estimates. 

Appendix 

A.1. Local linear estimation 

In this section, we first define the local linear marker-dependent estimator, $\hat\alpha_{x,L}(y)$, as 
defined in [30], page 118, 



where, with $w = (w_j)_{j=1}^{d+1} = (x, y)$ and $W_i(u) = (W_{ij}(u))_{j=1}^{d+1} = (X_i, u)$ (to simplify the 
notation, we consider the same kernel and bandwidth for $x$ and $y$), 




(15) 





{Kb{v)-Kb{v)v^D-^ci} (wei^'^+i). 




(16) 



n 



1 




w 





$$d_{jk} = n^{-1} \sum_{i=1}^n \int K_b\{w - W_i(u)\}\{w_j - W_{ij}(u)\}\{w_k - W_{ik}(u)\} C_i(u) Z_i(u)\,du,$$ 




and $c_1 = (c_{1j})_{j=1}^{d+1}$ and $D = (d_{jk})_{j,k=1}^{d+1}$. We then consider the local linear estimator of the 
integrated conditional hazard function, obtained when we undersmooth in the $y$-direction. 
First, we define the necessary kernel constants: 



n 




n 




n 
We then get that the local linear estimator of the integrated hazard is 

We correspondingly estimate the conditional survival function $S_x(y)$ and the regression 
functions $g_{\mathrm{mn}}(x)$ and $g_{\mathrm{med}}(x)$ by 

$$\hat S_{x,L}(y) = \prod_{0 < w \le y} \{1 - d\hat\Lambda_{x,L}(w)\},$$ 

$$\hat g_{L,\mathrm{mn}}(x) = \int_0^T \hat S_{x,L}(y)\,dy,$$ 

$$\hat g_{L,\mathrm{med}}(x) = \hat S_{x,L}^{-1}(0.5).$$ 
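The passage from an integrated hazard estimate to the survival function and then to the mean and median regression values can be sketched numerically. The code below is a minimal illustration under stated assumptions: the hazard increments are given on a grid (here a hypothetical unit hazard rate on $[0, T]$ with $T = 20$), the survival function is treated as a right-continuous step function, and the median is taken as the first grid point where the survival function drops to 0.5 or below. None of the names come from the paper.

```python
import numpy as np

def survival_from_hazard_increments(d_lambda):
    """Product-limit survival: S(y_k) = prod_{j <= k} (1 - dLambda(y_j))."""
    return np.cumprod(1.0 - np.asarray(d_lambda, dtype=float))

def mean_and_median(grid, surv):
    """Integrate the step function S over [0, T]; invert S at level 0.5."""
    grid = np.asarray(grid, dtype=float)
    surv = np.asarray(surv, dtype=float)
    widths = np.diff(np.concatenate(([0.0], grid)))
    s_left = np.concatenate(([1.0], surv[:-1]))   # value of S just before each grid point
    mean = float(np.sum(s_left * widths))
    below = np.nonzero(surv <= 0.5)[0]
    median = float(grid[below[0]]) if below.size else float("nan")
    return mean, median

# Hypothetical example: unit hazard rate, so S(y) is close to exp(-y).
T = 20.0
grid = np.linspace(0.0, T, 20001)[1:]
d_lambda = np.full(grid.size, T / 20000)
surv = survival_from_hazard_increments(d_lambda)
mean, median = mean_and_median(grid, surv)
print(mean, median)
```

For the unit-hazard example the mean should be close to 1 and the median close to $\log 2$, which gives a quick sanity check on the two functionals.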



A.2. Proof of results 

We restrict attention in the proofs to the case of local constant smoothing (i.e., when 
$A = C$). The case of local linear smoothing ($A = L$) can be considered in a very similar 
way and is therefore omitted. Throughout this section, we use the notation $A_n \sim B_n$ to 
indicate that $A_n = B_n\{1 + o_p(1)\}$. 

First, we state a useful lemma. Its simple proof is omitted. Let $h(y-)$ be the limit from 
the left at $y$ for any cadlag function $h$. 

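Lemma A.1. Suppose $\Lambda_1$ and $\Lambda_2$ are cadlag functions. Let $S_1(y) = \prod_{w \le y}\{1 - d\Lambda_1(w)\}$, 
$S_2(y) = \prod_{w \le y}\{1 - d\Lambda_2(w)\}$ and 

$$Q(y) = \frac{S_1(y)}{S_2(y)} - 1.$$ 

Then 

$$dQ(y) = -\frac{S_1(y-)}{S_2(y)}\,d(\Lambda_1 - \Lambda_2)(y).$$ 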
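The lemma can be checked numerically for purely discrete integrated hazards. The sketch below reads Lemma A.1 as the Duhamel-type identity $dQ(y) = -\{S_1(y-)/S_2(y)\}\,d(\Lambda_1 - \Lambda_2)(y)$ with $Q(y) = S_1(y)/S_2(y) - 1$; this reading, the random jump sizes and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
d_a1 = rng.uniform(0.0, 0.1, size=m)   # jumps of the first integrated hazard
d_a2 = rng.uniform(0.0, 0.1, size=m)   # jumps of the second integrated hazard

s1 = np.cumprod(1.0 - d_a1)                    # S_1 at the jump points
s2 = np.cumprod(1.0 - d_a2)                    # S_2 at the jump points
s1_left = np.concatenate(([1.0], s1[:-1]))     # left limits S_1(y-)

q = s1 / s2 - 1.0                              # Q(y) = S_1(y)/S_2(y) - 1
d_q = -(s1_left / s2) * (d_a1 - d_a2)          # claimed increments of Q

# Summing the claimed increments should reproduce Q up to rounding error.
max_gap = np.max(np.abs(np.cumsum(d_q) - q))
print(max_gap)
```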



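Proof of Theorem 3.1. Define 

$$\Lambda_x^*(y) = \int_0^y \frac{\sum_{i=1}^n K_b(x - X_i) C_i(u) Z_i(u) \alpha_{X_i}(u)}{\sum_{j=1}^n K_b(x - X_j) C_j(u) Z_j(u)}\,du.$$ 

Then 

$$\hat\Lambda_{x,C}(y) - \Lambda_x^*(y) = \sum_{i=1}^n K_b(x - X_i) \int_0^y \frac{dM_i(u)}{\sum_{j=1}^n K_b(x - X_j) C_j(u) Z_j(u)}.$$ 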
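To make the local constant integrated hazard estimator concrete, consider the special case of right censoring only, an assumption made here purely for illustration, with $C_j \equiv 1$ and $Z_j(u) = I(Y_j \ge u)$. The estimator then reduces to a kernel-weighted Nelson-Aalen estimator. The sketch below is a minimal version for a scalar covariate without ties; the function names and the Epanechnikov kernel are hypothetical choices, not taken from the paper.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * np.maximum(1.0 - v ** 2, 0.0)

def local_constant_cumhaz(x, X, Y, delta, b):
    """Kernel-weighted Nelson-Aalen estimate of the integrated hazard at x.

    X: covariates, Y: observed times, delta: 1 if uncensored, 0 if censored.
    Returns the uncensored times (sorted) and the estimate at those times.
    """
    X, Y, delta = (np.asarray(a) for a in (X, Y, delta))
    w = epanechnikov((x - X) / b) / b            # K_b(x - X_i)
    order = np.argsort(Y)
    Y, delta, w = Y[order], delta[order], w[order]
    at_risk = np.cumsum(w[::-1])[::-1]           # sum of K_b(x - X_j) over Y_j >= Y_i
    safe = np.where(at_risk > 0, at_risk, 1.0)   # avoid division by zero
    jumps = np.where((delta == 1) & (at_risk > 0), w / safe, 0.0)
    keep = delta == 1
    return Y[keep], np.cumsum(jumps)[keep]

# With equal weights and no censoring this is the ordinary Nelson-Aalen estimator.
times, cumhaz = local_constant_cumhaz(0.0, [0.0, 0.0, 0.0], [3.0, 1.0, 2.0], [1, 1, 1], b=1.0)
print(times, cumhaz)
```

Feeding the increments of this estimate into a product-limit formula gives the corresponding conditional survival estimate.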
Let $S_x^*(y) = \prod_{w \le y}\{1 - d\Lambda_x^*(w)\}$. We then divide our analysis into an analysis of the 
variable part 

$$V_x(y) = \hat S_{x,C}(y) - S_x^*(y) \qquad (17)$$ 

and of the stable part 

$$B_x(y) = S_x^*(y) - S_x(y). \qquad (18)$$ 

Note that $V_x(y) = S_x^*(y) Q_x^V(y)$, where $Q_x^V(y) = \hat S_{x,C}(y)/S_x^*(y) - 1$, and $B_x(y) = S_x(y) Q_x^B(y)$, 
where $Q_x^B(y) = S_x^*(y)/S_x(y) - 1$. Using integration by parts, we obtain 

$$\hat g_{C,\mathrm{mn}}(x) - g_{\mathrm{mn}}(x) = -\int_0^T y\,d[\hat S_{x,C}(y) - S_x(y)] = \int_0^T [\hat S_{x,C}(y) - S_x(y)]\,dy = V(x) + B(x),$$ 

where $V(x) = \int_0^T V_x(y)\,dy$ and $B(x) = \int_0^T B_x(y)\,dy$. By Lemma A.1, we have 

V{x)= f Sliy) r dQl{u)dy 
Jo Jo 

S:iy) f ^^}^d{AxAu)-Alc{n)}dy 

Jo '-'xK^^J 



Si{u-) ^ 'E;=iM=^-x,)c,{u)z,iu) 



dy 



f2 r K{u)dM.,{u), 



where 



K{u)= r S:iy f:f^'' K ^,ix~X,) ]^ dy 

Ju S*{u-) Y.j=iKb{x- Xj)Cj{u)Zj(u) 

SxAu-) K,{x-X,) f^s;{y)dy. 



S*x {u^) Kbix - X,)C, {u)Z, {u) 7„ 

Let 



V{x)^Y. C K{u)dM,{u), 
/.;(u)^n-^ ^^(^~f-^ rSx{y)dy. 
Then V{x) = V(a;) +Op(l) and by Nielsen and Linton [31], Proposition 1, (n&)^/^V(x) =^ 
N{0,v{x)), where 



vix)^p lim / hl{uf d{M,{u)) 

i=l •' 

= p lim vnn^^hS^ Ki,[x - [ — r~^l f Sa:{y)dy 



2 



{u)Ci{u)Zi{u) du 



The results given in Theorem 3.1 on the variable part follow from standard martingale 
theory; see, among many others, [31]. 

We now turn to the bias. Using (D4)-(D6), we have 

B{x)= r S^[y) r dQ^[u)dy 



S.iy) r ^^^d{Al{u)~A,{u)}dy 

Jo S,:[U-) 

S*{u-) / V ,Ci{u)Zi{u){ax,{u) - aj;{u)}du 



Kb{x-Xj)Cj{u)Z^{u) 



= -p2{K)h S^J^ |^^+2^^ ^jd.d, + op(6). 

The derivation of the asymptotic theory of the local linear case parallels the local 
constant case. While the variable part has the same asymptotic distribution, the stable 
part changes due to the bias properties of the local linear hazard estimator. By checking 
the derivation of the stable part of the local linear kernel hazard estimation of Nielsen 
[30], page 119, it is easy to see that the stable part of the local linear estimator can be 
written as 



b^ix)^-p2{KW S.iy)J^ gj ' dudy. □ 

Proof of Theorem 4.1. Consistency of $\hat\theta$ follows from condition (A1) and the fact that 

$$\sup_{\theta \in \Theta} |\hat l_{\alpha_0}(\theta; x) - l(\theta; x)| \stackrel{p}{\to} 0 \qquad (19)$$ 

(see, e.g., [40], Theorem 5.7, page 45). The result (19) follows from assumption (A2) and 
the uniform consistency of $\hat\alpha_{x,C}(y)$; this is established in [31], Theorem 2. Actually, 

iao{(^;x) - l{9;x) 
\ 1 \y 

ax,c(y) - -^ao 



n 2 
{ax,c{y)}~^E^,yW{x, y) dy 

2 

{ax{y)}^^'fix(y)w{x,y)dy 
ax,c{y) - ax{y)?{ax{y)}~^ipxiy)w{x, y) dy 



+ 2 / [ax,ciy) - axiy)] 



r \ 1 ly 



ax[y)-\oLA I 



{(^x{y)} ^(px{y)w{x,y)dy 



x[{&x^c{y)} ^Egy~{ax{y)} ^(px{y)]w{x,y)dy 

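and this converges to zero in probability, uniformly in $\theta \in \Theta$. 

We next establish asymptotic normality. First, we consider the Taylor expansion 

$$0 = \hat l'(\hat\theta; x) = \hat l'(\theta_0; x) + \hat l''(\theta^*; x)(\hat\theta - \theta_0), \qquad (20)$$ 

where $\theta^*$ lies between $\hat\theta$ and $\theta_0$. We have 

where $\rho_0(u) = \alpha_0(u) + u\alpha_0'(u)$ and 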



l''i9;x)^2 I pl\y-\^^^wix,y)dy 



ax,c{y) 



Oix,c{y) - 





12/ 




o\ 




ax,c{y) 


yl 




E'^ 
xy 




1 613 


otx,c{y) 



w{x,y) dy 
w{x,y) dy. 



We first establish the properties of $\hat l'(\theta_0; x)$. Recall from [31] that 



ax.,ciy) ~ (^xiy) 



x,y 



(21) 



where 



$$V_{x,y} = \frac{1}{n} \sum_{i=1}^n \int K_b(x - X_i) k_h(y - y')\,dM_i(y'),$$ 



Therefore, 



= 2 [po 



y 



1 

""'^ w{x,y)dy 



1 ^x.y 



+ 2J po 
2 / pa 



y 

9{x) J g{xy a^,c{y) 
1 B. 



g{x) J da;,c(y) 
w(a;,y) dy 



y 



^~-^w{x,y)dy 



g{x)} g{xY a^.,c{y) 
1 Vx^y 



w{x,y) dy 



2 / Po 



.g{x)] g{xY a^{y) 
where the last line follows from [27], Lemma 3. Consider 



w{x,y)dy, 



Po 



1 V,, 



g{x) J a^{y) 



—w{x,y) dy 



-y2Kb{x-X,) 
n ^ — ' 

i=l 
n „ 

/ hni{x,u)dMi{u) 



Po 



y 



1 1 



g{x) j g{x)^ a^{y) 



kh{y-y')w{x,y)dy 



where $h_{ni}(x,u) = n^{-1} K_b(x - X_i)\rho_0(u/g(x)) w(x,u)/\{g(x)^2 \alpha_x(u)\}$. By the central limit 
theorem for martingales, one gets (see, e.g., [31], Proposition 1) 



n ^ 

(n6)i/2^ / hm{x,u)dM,{u)^N{0,a^) 

i—l 

n „ 

a^—p lim nb^'^^ / h'^j^{u)d{Mi{u)) 

n „ 

= p lim n-H'S^ Klix - X^) \ 

71— >00 ' ^ J 



Po{u/g{x)) I \n I \r7 I \A 

— .^ w{x, u)axi {u)Ci {u)Z,{u) du 

(^x(u) g(x) 



= \\K\ 



2 f plW/gjx)) 

ax{u)g{xY 



(px{,u)w{x, u) du. 
Furthermore, 



Po 



1 B 

9{x) J gi^y oi^iy) 



n 



x-X, 



i=l 



Pa 



rkhiy-y') 



9{x) J a:^{y)g{xY 
X [axAy')-a.{y)]C.{y')Z,{y')dy'dy 



and it is easily seen that this can be written as a bias term of order $O(h^2) + O(b^2)$, plus 
a remainder term of order $O_p((nb)^{-1/2})$. 

Finally, note that for any sequence $\delta_n \to 0$, we have 

$$\sup_{|\theta - \theta_0| \le \delta_n} |\hat l''(\theta; x) - l''(\theta_0; x)| = o_p(1), \qquad (22)$$ 



where 



l"{eo;x)^2 / pI 



u 1 w{x, u) ifxiu) 



du 



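and $l''(\theta_0; x) > 0$. 

From (20), we then obtain, with $\theta_0 = g(x)$ and $\hat\theta$ the estimator of $g(x)$, 

$$\hat\theta - \theta_0 = -\{l''(\theta_0; x)\}^{-1} \hat l'(\theta_0; x)\{1 + o_p(1)\}$$ 
and the asymptotic distribution follows. 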
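The step from the Taylor expansion (20) to this linear representation can be spelled out as follows. This is only a sketch, using the consistency of $\hat\theta$, the uniform bound (22) and $l''(\theta_0; x) > 0$; the placement of hats and primes reflects a reading of the notation and should be treated as an assumption.

```latex
\begin{align*}
0 &= \hat l'(\theta_0; x) + \hat l''(\theta^*; x)(\hat\theta - \theta_0)
  && \text{(Taylor expansion (20))} \\
\hat\theta - \theta_0 &= -\{\hat l''(\theta^*; x)\}^{-1}\,\hat l'(\theta_0; x) \\
\hat l''(\theta^*; x) &= l''(\theta_0; x) + o_p(1)
  && \text{(by (22), since } |\theta^* - \theta_0| \le |\hat\theta - \theta_0| \to_p 0\text{)} \\
\hat\theta - \theta_0 &= -\{l''(\theta_0; x)\}^{-1}\,\hat l'(\theta_0; x)\,\{1 + o_p(1)\}
  && \text{(since } l''(\theta_0; x) > 0\text{)}
\end{align*}
```

The asymptotic distribution of $\hat\theta$ is therefore that of $\hat l'(\theta_0; x)$ rescaled by the deterministic factor $-\{l''(\theta_0; x)\}^{-1}$.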



□ 



Proof of Theorem 4.2. We first consider the infeasible estimator $\hat\alpha^{o}_{0,A}(y)$. Consider 
the following decomposition: 



"o,A(y) 



S Ex,yg(x)'^^'^^y9{x))dx 



Ii^x,yg(x)^i^^ y9{x))/ {g{x)ax,A{y9ix)))) dx 



A^iy) 



(say) 



/i?f,(y)/i?2°,(y)dx 
i°(y) A°(y) /•S°,(2;) 



S°(y) Bo{y)^J B-^{y) 



dx - 



B-{yy 



BiAyf 



dx 



(l + op(l)), 



where $A^o(y) = E[(CZ)\{yg(X)\} w\{X, yg(X)\}]$, $B^o(y) = E[(CZ)\{yg(X)\} w\{X, yg(X)\}]/\alpha_0(y)$, 
$B^o_{1x}(y) = E[(CZ)\{yg(x)\} w\{x, yg(x)\} \mid X = x]$ and $B^o_{2x}(y) = \alpha_0(y)$ are the limits of the 
corresponding quantities with hats. 
Straightforward calculations show that 

$$\hat A^o(y) = n^{-1} \sum_{i=1}^n (C_i Z_i)\{yg(X_i)\} w\{X_i, yg(X_i)\} + o_p((nh)^{-1/2}) + O(h^2) + O(b^2)$$ 

and 

+ op((n/i)-i/2) + 0(/i2) + 0(&2). 

Next, we consider the term J B°^{y)B2^{y)B2r^{y)^^ dx. Decomposing ax.A{yg{x)) = 
yg{x)l yg{x) ^2x{y) ^ similar way as above, we obtain, after some calculations, 
that 

BlAvWixiy) 



BUy? 



dx 



1 " 

= + ^^n-i ^ kh{yg{X,) ~ Y,}a{Y,)w{X,,yg{X,)) 

1 " 

- -^n-^y2(aZi}{yg{X.,)}w{X.„yg(X,)} + Op{{nh)-^/^) + 0{h') + 0{b^). 

My) j-{ 

Putting the three terms together, we get that 

1 " 

We now consider the feasible estimator $\hat\alpha_{0,A}(y)$. Write 

1 



ao,A(y) - Ea^ Aiy) = j kh{yg{u) - yg(M)) d(F°{u, v) - F°{u, v)) 

N-{y) 



B"{y) 



(say), 



where $\hat F^o(u,v) = n^{-1} \sum_{i=1}^n I(X_i \le u, Y_i \le v, C_i(Y_i) = 1)$ and $F^o(u,v) = E[\hat F^o(u,v)]$. 
Therefore, 

[ao,A(y) - Eao,A{y)] - [ao,A(2/) - 

(23) 



1 1 



Biy) B°{y) 



N{y) + ^^^[N{y)-N°{y% 
where $\hat B(y)$ and $\hat N(y)$ are defined by replacing $g(\cdot)$ in the formulas of $\hat B^o(y)$ and $\hat N^o(y)$ 
by $\hat g(\cdot)$, and where $E\hat\alpha_{0,A}(y)$ is the expected value of $\hat\alpha_{0,A}(y)$ with $\hat g$ considered as fixed. 
Write 

N°{y) = h-^ j k{z)w{u,v + hz) d{[F° - F°\{u,yg{u) ~ hz)). 

Hence, 

N{y)^N°{y) = 0[h-^ sup \F° (uM) - F° (uM) 

«,|t2-ti|<C(nf))-i/2(logri)i/2 

= Op{h-^n-^/^{nb)-^/'^(logny/'^)^Op{{nh)-^/^), 

provided nh"^ b (log oo. Next, note that N{y) = Op{{nh)-^/'^), B{y) - B°{y) = 
$O_p((nb)^{-1/2}) = o_p(1)$ and hence (23) is $o_p((nh)^{-1/2})$. Since it can be easily seen that 
$E\hat\alpha_{0,A}(y) - E\hat\alpha^{o}_{0,A}(y) = O(b^2) + O(h^2)$, it follows that $\hat\alpha_{0,A}(y)$ and $\hat\alpha^{o}_{0,A}(y)$ are asymptotically 
equivalent. 

Finally, we consider the calculation of the asymptotic variance of ao,A{y)- 
AsVar(do,A(y)) = ^^;;j-^\a.Y[kn{yg{X) - Y}C{Y)w{X,yg{X))] 



B°{yy 

1 1 kl{yg{x)~t}E[C{t)\Y^t,X = x]dF^{t) 
xw^{x,ygix))dFix){l + o{l)) 
k^u) duE{E[C{ygiX))\Y = yg{X),X] 
X fx{yg{X))w^{X,y9{X))}il + o{l)). □ 



B°{yY 



Proof of Theorem 4.3. Consistency of the estimator $\hat\theta$ of $g(x)$ follows similarly as in the proof 
of Theorem 4.1 from condition (A1), (19) and the fact that 

sup |/5(0;x)-f„„(0;x)|4o. (24) 

ee/„(a:) 

Equation (24) follows from assumption (C1) and the uniform consistency of $\hat\alpha_{x,C}(y)$; see 
the proof of Theorem 4.1. For the proof of Theorem 4.3, it remains to show for $\theta_0 = g(x)$ 
that for some $\gamma^*$, 

4((?o;^) =C(f?o;a;) + /iV+op((n6)-i/2)^ (25) 
m-,x)=C^^{e-x)+op{l), (26) 






uniformly for $\theta$ in a neighborhood of $\theta_0$. Claim (26) follows immediately from assumption 
(C1). For the proof of (25), note first that 



1 ^^-y I \A 
w(x,y)dy 



-{ao - ao) 



-(ao - ao) 



(Po - Pq) 



y 



1 

"''^ -w{x,y)dy 



do J 6*0 "^,c(y) 



1 E^, 



where $\hat\rho_0(u) = \hat\alpha_0(u) + u\hat\alpha_0'(u)$. It follows from (C1) that the second term of the right-hand 
side is of order $o_p(n^{-2/5})$. From (C1) and (C2), we get that up to a deterministic 
term of order $O(h^2)$, the third term is also of order $o_p(n^{-2/5})$. The first term is equal to 
$T_n + o_p(n^{-2/5})$, where 



T„ = 2 



iPa-Po) 



y 



1 

^w{x,y) dy. 



For the proof of Theorem 4.3, it remains to show that 

$$T_n = o_p(n^{-2/5}).$$ 

By application of (21), we can write $T_n = T_{n,1} + T_{n,2}$, where 

y\i 1 



(27) 



w{x,y)dy, 



Tn,l = 2 J Vx^yipQ - Po) (^J- 

T„.2 = 2jBx,y{po Po){£) ^^^^w{x,y)dy. 

It can be easily checked that $T_{n,2} = o_p(n^{-2/5})$ (cf. the proof of Theorem 4.1). The 
term $T_{n,1}$ can be decomposed into $T_{n,11} + T_{n,12}$, where 

$$T_{n,11} = \sum_{i=1}^n \int h_{ni}(x,u)\,dM_i(u), \qquad T_{n,12} = \sum_{i=1}^n \int g_{ni}(x,u)\,dM_i(u),$$ 



with 



hni{x,u) = -Kb{x- Xi 



gm[x,u) = -Kb{x- Xi) 
n 



y \ I 1 



khiy -u)w{x,y)dy 



(do - ao) 



y 1 1 1 



OojOl ax{y) 



kh{y-u)w{x,y) dy 
We now show that $T_{n,12} = o_p(n^{-2/5})$. The claim $T_{n,11} = o_p(n^{-2/5})$ can be shown by 
similar methods. For the proof, we apply [39], Lemma 5.14. This lemma gives a bound 
on the increments of the empirical process applied to function classes that depend on the 
sample size. We apply the lemma with a fixed value of $x$, conditional on the event that 
the number of values of $X_i$ in the support of $K_b$ is equal to $m$, where $m$ is of the same 
order as $nb$. We consider the class of functions $g : J(x) \to \mathbb{R}$ such that, with a sufficiently 
large constant $C$, for all $z \in J(x)$, $|g(z) - \alpha_0'(z)| \le C\delta_{1,n}$ and $|g'(z)| \le C\delta_{2,n}$. We apply 
the lemma with $\alpha = \beta = 1$ and $M = C\delta_{2,n}$. We get that 



sup 



1 1 



-kh{y-u)w{x,y)dy 



dM,{u) 



is of order Op{Sl^^^^Sl^^^{nb)-^/^ + S2,ninb)-^) ^ op{{nb)-^/^). This shows that T„,i2 = 
$o_p(n^{-2/5})$ and thus concludes the proof of Theorem 4.3. □ 